Gesture Recognition

A gesture recognition system that can understand a user’s intentions from hand or arm movements allows people to interact with any device that has a camera, without using a keyboard or mouse. One problem with traditional gesture recognition systems is that they rely on special equipment, such as gloves, stereo cameras, or depth sensors. For this reason, these systems cannot be used on laptops, desktops, or smartphones that don’t have any of the required special equipment.

TwentyBN has built a robust gesture recognition system that can detect and recognize 25 hand gestures in real time using just a regular webcam, eliminating the need for any special equipment. Their approach consisted of training an End-to-End 3D CNN on a very large, annotated dataset of hand gestures, known as the Jester dataset. You can see a demo of their system below:

In the following lessons, we will take a closer look at the approach TwentyBN used to create their gesture recognition system. Let’s start in the next lesson by understanding what End-to-End learning means.
